# README for computer programs for "Efficient GMM estimation with incomplete data" by Chris Muris

Bristol, Feb 21, 2019.

Comments and questions are welcome at chris.muris@bristol.ac.uk.

Files with prefix "sim-" contain source files and output for the simulation section (Appendix).
Files with prefix "trade-" are related to the empirical application in Section 6.

Before you proceed with the steps below, check that your machine has the following software installed:

- Stata, version >=12 (needed only for the empirical application; the simulations do not require Stata)
- A working installation of R, version >=3.0.0 [https://www.r-project.org/]
- A working copy of RStudio, version >=1.1 [https://www.rstudio.com/products/rstudio/download/]
- Versions of the following R packages with build dates on or after August 1, 2018:
	- tidyverse
	- haven
	- survival
	- AER
	- cowplot
	- gmm
	- plm
	- mgcv
	- Matrix
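
The package check can be sketched in R as follows. This is a minimal sketch, not part of the original archive: the helper names `missing_packages` and `r_at_least` are hypothetical, and the code only checks that the packages are present and that R is recent enough. It does not check build dates.

```r
# Packages listed in this README.
required <- c("tidyverse", "haven", "survival", "AER", "cowplot",
              "gmm", "plm", "mgcv", "Matrix")

# Hypothetical helper: names of packages that are not installed.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# Hypothetical helper: TRUE if the running R is at least the given version.
r_at_least <- function(version) {
  getRversion() >= version
}

missing <- missing_packages(required)
if (length(missing) > 0) {
  message("Missing packages: ", paste(missing, collapse = ", "))
  # install.packages(missing)  # uncomment to install from CRAN
} else {
  message("All required packages are installed.")
}

if (!r_at_least("3.0.0")) {
  warning("R >= 3.0.0 is required.")
}
```

Run this once in the R console before opening any of the '.Rmd' files.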

## trade-liberalization

To replicate the empirical illustration in Section 6, proceed as follows:

1. Extract the files from the archive, and move those starting with "trade-" into a new folder (the "working directory").
2. Obtain a copy of the file "prod_dataregression.tab" from the data archive used by Topalova and Khandelwal (2011), via REStat's Dataverse page [https://dataverse.harvard.edu/dataset.xhtml?persistentId=hdl:1902.1/18094].
3. Put that file in the working directory.
4. Locate the .do file 'trade-01-load-data.do' in your working directory. Execute it in Stata, version >=12.
5. Check that the .do file created the .dta file 'trade_liberalization.dta' in your working directory.
6. Run 'trade-02-replication_incomplete.Rmd' to generate the results in Table 1.
7. Run 'trade-03-replication_incomplete.Rmd' to generate the results in Table 2.

Note: '.Rmd' files are best run in RStudio. Open each file and choose either "Run All" or "Knit to HTML".
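
If you prefer the R console to the RStudio buttons, steps 5 to 7 can be sketched as below. This is a sketch under two assumptions: the 'rmarkdown' package is installed (it is not in the package list above), and the R working directory is the folder from step 1. The helper `missing_files` is a hypothetical name, not something defined in the archive.

```r
# Files that must be in the working directory before steps 6 and 7.
rmd_files <- c("trade-02-replication_incomplete.Rmd",
               "trade-03-replication_incomplete.Rmd")
inputs <- c("trade_liberalization.dta", rmd_files)

# Hypothetical helper: which of the given files are absent from 'dir'?
missing_files <- function(files, dir = ".") {
  files[!file.exists(file.path(dir, files))]
}

absent <- missing_files(inputs)
if (length(absent) > 0) {
  message("Not found in working directory: ", paste(absent, collapse = ", "))
} else if (requireNamespace("rmarkdown", quietly = TRUE)) {
  # Knit each document to HTML, similar to "Knit to HTML" in RStudio.
  for (f in rmd_files) {
    rmarkdown::render(f, output_format = "html_document")
  }
} else {
  message("Package 'rmarkdown' is not installed; open the files in RStudio instead.")
}
```

If the '.dta' file is reported as missing, rerun the .do file from step 4 in Stata first.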

## simulations

The files with prefix "sim-" correspond to the simulation studies in the Supplementary Material (Sections D.1 and D.2).

The files with '.Rmd' extension are R Markdown files [https://rmarkdown.rstudio.com/] that contain all the code to run the simulation study. Simply open the '.Rmd' files in RStudio and "Run All". The results you obtain should be similar, but not identical, to those in the paper. The difference is due to simulation error. I have found that simulation error to be small given the choice of sample size and number of simulations.

- "sim-dpd_cohorts_design1.Rmd" produces
	1. "dpd_design_1.pdf", a PDF containing Figure D.1;
	2. "dpd_cohorts_design1.RData", an R data file. It contains an object "table_design1", which corresponds to Table 5.
- "sim-dpd_cohorts_design2.Rmd" produces 
	1. "dpd_cohorts_design2.RData", an R data file. It contains an object "result_design2" that the code chunk starting on line 499 turns into panels 1, 2, and 3 of Table 7. The code chunk starting on line 514 turns the object "ss_09_05_200" into panel 4 of that table.
- "sim-fixedeffects_binarychoice.Rmd" produces
	1. "binarychoice_noselection.pdf", a PDF file corresponding to Figure D.2;
	2. "binarychoice_selection.pdf", a PDF file corresponding to Figure D.3;
	3. "20180515_results_binaryFE.RData", an R data file containing the results of your simulation study.
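
Once a simulation file has been run, the saved '.RData' output can be inspected along the following lines. This is a minimal sketch: `load_results` is a hypothetical helper, and it assumes "dpd_cohorts_design1.RData" is in the current working directory. Loading into a separate environment keeps the stored objects from overwriting anything in your workspace.

```r
# Hypothetical helper: load an .RData file into its own environment.
load_results <- function(path) {
  env <- new.env()
  load(path, envir = env)
  env
}

path <- "dpd_cohorts_design1.RData"
if (file.exists(path)) {
  res <- load_results(path)
  print(ls(res))             # names of all objects stored in the file
  print(res$table_design1)   # the object described above for design 1
} else {
  message("Run 'sim-dpd_cohorts_design1.Rmd' first to create ", path)
}
```

The same helper can be pointed at "dpd_cohorts_design2.RData" to look at "result_design2" and "ss_09_05_200".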

The R Markdown files contain running commentary that provides more detail. Do not hesitate to contact me if you have further questions about the code in this archive.


